146 research outputs found

    Cost-effective On-device Continual Learning over Memory Hierarchy with Miro

    Continual learning (CL) trains neural network (NN) models incrementally from a continuous stream of tasks. To retain previously learned knowledge, prior studies store old samples across a memory hierarchy and replay them when new tasks arrive. Edge devices that adopt CL to preserve data privacy are typically energy-sensitive and thus require high model accuracy without compromising energy efficiency, i.e., cost-effectiveness. Our work is the first to explore the design space of hierarchical memory replay-based CL to gain insights into achieving cost-effectiveness on edge devices. We present Miro, a novel system runtime that carefully integrates these insights into the CL framework, enabling it to dynamically configure the CL system based on resource states for the best cost-effectiveness. To reach this goal, Miro also performs online profiling of parameters with clear accuracy-energy trade-offs and adapts them to optimal values with low overhead. Extensive evaluations show that Miro significantly outperforms the baseline systems we build for comparison, consistently achieving higher cost-effectiveness.
    Comment: To be published in the 29th Annual International Conference on Mobile Computing and Networking (ACM MobiCom '23).
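    The replay mechanism this abstract builds on can be sketched in a few lines. The snippet below is an illustrative reservoir-sampling replay buffer, not Miro's implementation; the class name, capacity, and task sizes are all hypothetical.

```python
import random

class ReplayBuffer:
    """Fixed-capacity episodic memory that keeps a uniform random
    subset of all samples seen so far (reservoir sampling)."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            # Keep each past sample with probability capacity/seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k):
        # Mini-batch of old samples to replay alongside new-task data.
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

# Stream two "tasks" of 100 samples each through a 16-slot buffer.
buf = ReplayBuffer(capacity=16)
for task in range(2):
    for i in range(100):
        buf.add((task, i))

replay_batch = buf.sample(8)
```

    In a hierarchical-memory setting like the one the paper studies, the buffer capacity and where samples physically live (DRAM vs. flash) become the knobs with accuracy-energy trade-offs that a runtime such as Miro would tune.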

    Redesigning spectroscopic sensors with programmable photonic circuits

    Optical spectroscopic sensors are a powerful tool to reveal light-matter interactions in many fields, such as physics, biology, chemistry, and astronomy. Miniaturizing the currently bulky spectrometers has become imperative for the wide range of applications that demand in situ or even in vitro characterization systems, a field that is growing rapidly. Benchtop spectrometers offer superior resolution and spectral range, but at the expense of a large size. In this paper, we propose a novel method that redesigns spectroscopic sensors via the use of programmable photonic circuits. Drawing from compressive sensing theory, we start by investigating the ideal sampling matrix for a reconstructive spectrometer and reveal that a sufficiently large number of sampling channels is a prerequisite for both fine resolution and low reconstruction error. This number is, however, still considerably smaller than the number of reconstructed spectral pixels, benefiting from the nature of reconstruction algorithms. We then show that a cascade of a few engineered Mach-Zehnder interferometer (MZI) elements can be readily programmed to create an exponentially scalable number of such sampling spectral responses over an ultra-broad bandwidth, allowing for ultra-high resolution down to single-digit picometers without incurring additional hardware costs. Experimentally, we implement an on-chip spectrometer with a fully programmable 6-stage cascaded MZI structure and demonstrate a 200 nm bandwidth using only 729 sampling channels. This achieves a bandwidth-to-resolution ratio of over 20,000, which is, to the best of our knowledge, about one order of magnitude greater than any miniaturized spectrometer reported to date. We further illustrate that by employing dispersion-engineered waveguide components, the device bandwidth can be extended to over 400 nm.
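    The exponential channel scaling quoted above can be checked with back-of-envelope arithmetic. The 729 channels from a 6-stage cascade are consistent with three programmable states per stage (729 = 3^6); that per-stage count is an inference from the numbers in the abstract, not something it states explicitly.

```python
# Assumed: each cascaded MZI stage contributes a fixed number of
# programmable states, so channels scale exponentially with stages.
STATES_PER_STAGE = 3  # inferred from 729 = 3**6; illustrative only

def num_channels(stages, states=STATES_PER_STAGE):
    """Sampling channels available from a cascade of `stages` MZIs."""
    return states ** stages

channels = num_channels(6)  # the 6-stage device in the experiment

# A bandwidth-to-resolution ratio above 20,000 over a 200 nm band
# implies a resolution around 200 / 20000 = 0.01 nm = 10 pm,
# i.e. single-digit-to-low-tens of picometers.
bandwidth_nm = 200.0
resolution_nm = bandwidth_nm / 20000
```

    The key point from compressive sensing is that these 729 channels reconstruct far more spectral pixels than 729, since sparse-recovery algorithms need fewer measurements than unknowns.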

    Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

    Large-scale transformer models have become the de facto architectures for various machine learning applications, e.g., CV and NLP. However, these large models also introduce prohibitive training costs. To mitigate this issue, we propose a novel random and layerwise token dropping method (random-LTD), which skips the computation of a subset of the input tokens at all middle layers. Notably, random-LTD achieves considerable speedups with accuracy comparable to the standard training baseline. Compared to other token dropping methods, random-LTD does not require (1) any importance-score-based metrics, (2) any special token treatment (e.g., [CLS]), or (3) full-sequence-length training for most layers, needing it only for the first and last layers. In addition, a new LayerToken learning rate schedule is proposed for pretraining problems, which resolves the heavy tuning requirement of our proposed training mechanism. Finally, we demonstrate that random-LTD can be applied to broader applications, including GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results show that random-LTD can save about 33.3% theoretical compute cost and 25.6% wall-clock training time while achieving similar zero-shot evaluations on GPT-3 1.3B as compared to the baseline.
    Comment: 22 pages.
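    The dropping pattern described above can be sketched as a per-layer schedule of which token positions are processed. This is an illustrative sketch, not the authors' code; the layer count, sequence length, and keep ratio are arbitrary, and no importance scores are used, matching the "random" part of random-LTD.

```python
import random

def random_ltd_schedule(num_layers, seq_len, keep_ratio, seed=0):
    """For each layer, pick which token positions are processed.
    First and last layers always see the full sequence; middle
    layers process a random subset (layerwise token dropping)."""
    rng = random.Random(seed)
    kept = []
    for layer in range(num_layers):
        if layer == 0 or layer == num_layers - 1:
            kept.append(list(range(seq_len)))          # full sequence
        else:
            k = max(1, int(seq_len * keep_ratio))
            kept.append(sorted(rng.sample(range(seq_len), k)))
    return kept

schedule = random_ltd_schedule(num_layers=12, seq_len=512, keep_ratio=2 / 3)

# Rough theoretical compute saving vs. processing every token everywhere
# (ignores attention's quadratic term; illustration only).
full_cost = 12 * 512
actual_cost = sum(len(tokens) for tokens in schedule)
saving = 1 - actual_cost / full_cost
```

    The reported ~33.3% theoretical saving corresponds to a more aggressive drop schedule than this example; the mechanism is the same.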

    Simple transporter trafficking model for amphetamine-induced dopamine efflux

    Amphetamine and its derivatives are important drugs of abuse causing both short-term excitatory and long-term addictive effects. The short-term excitatory effects are linked to amphetamine's ability to maintain high levels of dopamine (DA) outside the cell, both by inhibiting DA reuptake after synaptic transmission and by enhancing the efflux of DA from dopaminergic cells. The molecular mechanisms by which amphetamine elicits the efflux of DA and similar monoamines are still unclear. Recent literature suggests that trafficking of the monoamine transporters underlies observed changes in amphetamine-induced monoamine reuptake and efflux. We develop an ordinary differential equation model incorporating the diverse mechanistic details behind amphetamine-induced DA efflux and demonstrate its utility in describing our experimental data. We also demonstrate an experimental method to track the time-varying concentration of membrane-bound transporter molecules from the DA efflux data. The good fit between our model and the experimental data supports the hypothesis that amphetamine-induced transporter trafficking is necessary to produce extended efflux of DA. This model can explain the relative significance of different processes associated with DA efflux at different times and at different concentration ranges of amphetamine and DA. Synapse 61:500–514, 2007. © 2007 Wiley-Liss, Inc.
    Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/56075/1/20390_ftp.pd
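    The modeling idea, an ODE system where membrane-bound transporter levels change over time (trafficking) and efflux scales with those levels, can be illustrated with a toy two-variable model. The equations and rate constants below are hypothetical placeholders for illustration, not the published model, which incorporates many more mechanistic details.

```python
def simulate_efflux(k_insert, k_remove, k_efflux, t_end=10.0, dt=0.001):
    """Forward-Euler integration of a toy trafficking/efflux model.

    T  - fraction of transporter bound to the membrane
    DA - cumulative extracellular dopamine from efflux
    dT/dt  = k_insert * (1 - T) - k_remove * T   (trafficking balance)
    dDA/dt = k_efflux * T                        (efflux scales with T)
    All rate constants are hypothetical.
    """
    T, DA = 1.0, 0.0          # start fully membrane-bound, no efflux yet
    for _ in range(int(t_end / dt)):
        dT = k_insert * (1.0 - T) - k_remove * T
        dDA = k_efflux * T
        T += dT * dt
        DA += dDA * dt
    return T, DA

# Net removal (k_remove > k_insert) drives T toward a lower steady
# state k_insert / (k_insert + k_remove), throttling further efflux.
T_final, DA_final = simulate_efflux(k_insert=0.2, k_remove=0.5, k_efflux=1.0)
```

    Fitting such a model to measured efflux traces is what lets the time-varying membrane-bound transporter concentration be inferred from the DA data, the method the abstract describes.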

    Drinking from Both Glasses: Combining Pessimistic and Optimistic Tracking of Cross-Thread Dependences

    It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism, such as multithreaded record & replay, data race detectors, transactional memory, and enforcement of stronger memory models, helps achieve these goals, but existing commodity solutions slow programs substantially in order to track (i.e., detect or control) an execution's cross-thread dependences accurately. Prior work tracks cross-thread dependences either "pessimistically," slowing every program access, or "optimistically," allowing lightweight instrumentation of most accesses but dramatically slowing accesses involved in cross-thread dependences. This paper seeks to hybridize pessimistic and optimistic tracking, which is challenging because of a fundamental mismatch between the two. We address this challenge based on insights about how dependence tracking and program synchronization interact, and introduce a novel approach called hybrid tracking. Hybrid tracking is suitable for building efficient runtime support, which we demonstrate by building hybrid-tracking-based versions of a dependence recorder and a region serializability enforcer. An adaptive, profile-based policy makes runtime decisions about switching between pessimistic and optimistic tracking. Our evaluation shows that hybrid tracking enables runtime support to overcome the performance limitations of both pessimistic and optimistic tracking alone.
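    The cost model behind the pessimistic/optimistic trade-off, and an adaptive switch between the two, can be sketched as per-variable state. This toy is illustrative only: the class, cost constants, and switching threshold are invented for the example and are not the paper's policy.

```python
PESSIMISTIC, OPTIMISTIC = "pessimistic", "optimistic"

class TrackedVar:
    """Toy per-variable dependence tracker.

    Optimistic mode: same-thread accesses are essentially free, but a
    cross-thread access (a conflict) pays a large transition cost.
    Pessimistic mode: every access pays a small fixed tracking cost.
    A simple profile-based policy switches modes after repeated
    conflicts (hypothetical threshold).
    """
    def __init__(self):
        self.mode = OPTIMISTIC
        self.owner = None
        self.conflicts = 0
        self.cost = 0

    def access(self, thread_id):
        if self.mode == PESSIMISTIC:
            self.cost += 1                    # per-access tracking cost
        elif self.owner in (None, thread_id):
            self.owner = thread_id            # same-thread access: cheap
        else:
            self.cost += 50                   # cross-thread conflict: costly
            self.conflicts += 1
            self.owner = thread_id
            if self.conflicts >= 3:           # adaptive policy kicks in
                self.mode = PESSIMISTIC

v = TrackedVar()
for tid in [0, 0, 1, 0, 1, 0, 1, 1, 1]:       # heavily shared access pattern
    v.access(tid)
```

    A variable touched by one thread stays cheap in optimistic mode forever; a heavily shared one, as above, quickly racks up conflict costs until the policy flips it to pessimistic tracking, which is the intuition behind hybridizing the two schemes.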